The forecasting problem for a stationary and ergodic binary time series $\{X_n\}_{n=0}^{\infty}$ is to estimate the probability that $X_{n+1} = 1$ based on the observations $X_i$, $0 \le i \le n$, without prior knowledge of the distribution of the process $\{X_n\}$. It is known that this is not possible if one estimates at all values of $n$. We present a simple procedure which will attempt to make such a prediction infinitely often at carefully selected stopping times chosen by the algorithm. We show that the proposed procedure is consistent under certain conditions, and we estimate the growth rate of the stopping times.

1 Introduction



T. Cover [3] posed two fundamental problems concerning estimation for stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$. (Note that a stationary time series $\{X_n\}_{n=0}^{\infty}$ can be extended to be a two sided stationary time series $\{X_n\}_{n=-\infty}^{\infty}$.) Cover's first problem was on backward estimation.

Problem 1 Is there an estimation scheme $f_{n+1}$ for the value $P(X_1 = 1 \mid X_{-n}, \ldots, X_0)$ such that $f_{n+1}$ depends solely on the observed data segment $(X_{-n}, \ldots, X_0)$ and

$$\lim_{n\to\infty} \left| f_{n+1}(X_{-n}, \ldots, X_0) - P(X_1 = 1 \mid X_{-n}, \ldots, X_0) \right| = 0$$

almost surely for all stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$?

This problem was solved by Ornstein [13] by constructing such a scheme. (See also Bailey [2].) 
Ornstein's scheme is not a simple one and the proof of consistency is rather sophisticated. 
A much simpler scheme and proof of consistency were provided by Morvai, Yakowitz, Gyorfi 
[12]. (See also Weiss [18].) 

Cover's second problem was on forward estimation (forecasting). 

Problem 2 Is there an estimation scheme $f_{n+1}$ for the value $P(X_{n+1} = 1 \mid X_0, \ldots, X_n)$ such that $f_{n+1}$ depends solely on the data segment $(X_0, \ldots, X_n)$ and

$$\lim_{n\to\infty} \left| f_{n+1}(X_0, \ldots, X_n) - P(X_{n+1} = 1 \mid X_0, \ldots, X_n) \right| = 0$$

almost surely for all stationary and ergodic binary time series?

This problem was answered by Bailey [2] in a negative way, that is, he showed that there is 
no such scheme. (Also see Ryabko [16], Gyorfi, Morvai, Yakowitz [7] and Weiss [18].) Bailey 
used the technique of cutting and stacking developed by Ornstein [14] (see also Shields 
[17]). Ryabko's construction was based on a function of an infinite state Markov-chain. This 
negative result can be interpreted as follows. Consider a market analyst whose task it is to 
predict the probability of the event 'the price of a certain share will go up tomorrow' given 
the observations up to the present day. Bailey's result says that the difference between the 
estimate and the true conditional probability cannot eventually be small for all stationary 
and ergodic market processes. The difference will be big infinitely often. These results show 
that there is a great difference between Problems 1 and 2. Problem 2 was addressed by
Morvai, Yakowitz, Algoet [11] and a very simple estimation scheme was given which satisfies
the statement in Problem 2 in probability instead of almost surely. However, for the class
of all stationary and ergodic binary Markov-chains of some finite order Problem 2 can be 
solved. Indeed, if the time series is a Markov-chain of some finite (but unknown) order, we 
can estimate the order (e.g. as in Csiszar, Shields [5]) and count frequencies of blocks with 
length equal to the order. 
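The frequency-counting step for a finite-order Markov chain can be sketched as follows. This is only an illustration, assuming the order is already known (the order-estimation step of [5] is not reproduced here); the function name `markov_forecast` and its return convention are hypothetical choices, not part of the original text.

```python
from collections import Counter


def markov_forecast(xs, order):
    """Estimate P(X_{n+1} = 1 | last `order` symbols) by counting block frequencies.

    xs    : list of 0/1 observations X_0, ..., X_n
    order : assumed (known) Markov order, at least 1
    Returns None when the current context was never followed by a symbol.
    """
    if order < 1 or len(xs) <= order:
        raise ValueError("need order >= 1 and more than `order` observations")
    context = tuple(xs[-order:])   # the most recent block of length `order`
    seen = Counter()               # how often each block was followed by a symbol
    ones = Counter()               # how often each block was followed by a 1
    for i in range(len(xs) - order):
        block = tuple(xs[i:i + order])
        seen[block] += 1
        ones[block] += xs[i + order]
    if seen[context] == 0:
        return None
    return ones[context] / seen[context]
```

For an alternating sequence ending in 0, every earlier 0 was followed by a 1, so the estimate with `order=1` is 1.0.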

Let $\mathcal{X}^{*-}$ be the set of all one-sided binary sequences, that is,

$$\mathcal{X}^{*-} = \{(\ldots, x_{-1}, x_0) : x_i \in \{0, 1\} \text{ for all } -\infty < i \le 0\}.$$

Let $d(\cdot,\cdot)$ be the Hamming distance (that is, for $x, y \in \{0,1\}$, $d(x,y) = 0$ if and only if $x = y$ and $d(x,y) = 1$ otherwise), and define the distance on sequences $(\ldots, x_{-1}, x_0)$ and $(\ldots, y_{-1}, y_0)$ as follows. Let

$$d^*((\ldots, x_{-1}, x_0), (\ldots, y_{-1}, y_0)) = \sum_{i=0}^{\infty} 2^{-i-1} d(x_{-i}, y_{-i}). \tag{1}$$

(For details see Gray [6] p. 51.)
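A small sketch of the metric in (1), restricted to finite suffixes, may help to see how it weights recent coordinates. It assumes the two sequences are given as finite lists whose last element is the coordinate with index 0; coordinates beyond the shorter list are ignored, which changes the value by at most $2^{-m}$ when $m$ coordinates are compared. The name `d_star` is a hypothetical choice.

```python
def d_star(x, y):
    """Truncated version of the metric d* in (1).

    x, y : finite lists read as (..., x_{-1}, x_0), so x[-1] is x_0.
    Only the coordinates present in both lists are compared.
    """
    depth = min(len(x), len(y))
    total = 0.0
    for i in range(depth):
        hamming = 0 if x[-1 - i] == y[-1 - i] else 1   # d(x_{-i}, y_{-i})
        total += 2.0 ** (-i - 1) * hamming
    return total
```

Two sequences that agree on their last $J$ coordinates are within $2^{-J}$ of each other, which is why continuity with respect to $d^*$ means that the distant past has a vanishing influence.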

Definition 1 The conditional probability $P(X_1 = 1 \mid \ldots, X_{-1}, X_0)$ is almost surely continuous if for some set $C \subseteq \mathcal{X}^{*-}$ which has probability one the conditional probability $P(X_1 = 1 \mid \ldots, X_{-1}, X_0)$ restricted to this set $C$ is continuous with respect to the metric $d^*(\cdot,\cdot)$ in (1).

We note that from the proof of Ryabko [16] and Gyorfi, Morvai, Yakowitz [7] it is clear that even for the class of all stationary and ergodic binary time-series with almost surely continuous conditional probability $P(X_1 = 1 \mid \ldots, X_{-1}, X_0)$ one cannot solve Problem 2.

For $n \ge 1$, let the function $p_n(\cdot)$ be defined as

$$p_n(x_{-n+1}, \ldots, x_0) = P(X_{-n+1} = x_{-n+1}, \ldots, X_0 = x_0) \tag{2}$$

where $x_{-i} \in \{0, 1\}$ for $0 \le i \le n-1$.

The entropy rate $H$ associated with a stationary binary time-series $\{X_n\}_{n=-\infty}^{\infty}$ is defined as $H = \lim_{n\to\infty} -\frac{1}{n} E \log_2 p_n(X_{-n+1}, \ldots, X_{-1}, X_0)$. We note that the entropy rate of a stationary binary time-series always exists. For details cf. Cover, Thomas [4], pp. 63-64.
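As an illustration of the definition, the normalized block entropy $-\frac{1}{n} E \log_2 p_n$ can be evaluated exactly for small $n$ by enumerating all $2^n$ blocks. The sketch below assumes an i.i.d. Bernoulli series purely as an example, in which case the value equals the binary entropy for every $n$; the helper names are hypothetical.

```python
import itertools
import math


def normalized_block_entropy(block_prob, n):
    """Return -(1/n) * E[log2 p_n(X_{-n+1}, ..., X_0)] by enumerating all 2^n blocks."""
    total = 0.0
    for block in itertools.product((0, 1), repeat=n):
        p = block_prob(block)
        if p > 0:
            total -= p * math.log2(p)
    return total / n


def iid_block_prob(q):
    """Block probabilities p_n of an i.i.d. Bernoulli(q) series (illustrative assumption)."""
    def prob(block):
        ones = sum(block)
        return q ** ones * (1 - q) ** (len(block) - ones)
    return prob
```

For a process with memory the same function can be used with a different `block_prob`, and the values then decrease towards $H$ as $n$ grows.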

Now we may pose our problem. 

Problem 3 Is there a sequence of strictly increasing stopping times $\{\lambda_n\}$ with

$$\lambda_n < 2^{n(H+\epsilon)}$$

and an estimation scheme $f_n(X_0, \ldots, X_{\lambda_n})$ which depends on the observed data segment $(X_0, \ldots, X_{\lambda_n})$ such that

$$\lim_{n\to\infty} \left| f_n(X_0, \ldots, X_{\lambda_n}) - P(X_{\lambda_n+1} = 1 \mid X_0, \ldots, X_{\lambda_n}) \right| = 0$$

almost surely for all stationary and ergodic binary time series $\{X_n\}_{n=-\infty}^{\infty}$ with almost surely continuous conditional probability $P(X_1 = 1 \mid \ldots, X_{-1}, X_0)$?

It turns out that the answer is affirmative and such a scheme will be exhibited below. This result can be interpreted as follows: the market analyst may refrain from predicting, that is, he may say that he does not want to predict today, but he will predict at infinitely many time instances, and not too rarely, since $\lambda_n < 2^{n(H+\epsilon)}$, and the difference between the prediction and the true conditional probability will vanish almost surely at these stopping times. We note that the stationary processes with almost surely continuous conditional distribution generalize the processes for which the conditional distribution is actually continuous; these are essentially the Random Markov Processes of Kalikow [8], or the continuous g-measures studied by Mike Keane in [9]. Morvai [10] proposed a different estimator which is consistent on a certain stopping time sequence, but those stopping times grow like an exponential tower, which is unrealistic and much faster than the merely exponential growth in Problem 3.

2 The Proposed Estimator

Let $\{X_n\}_{n=-\infty}^{\infty}$ be a stationary time series taking values from a binary alphabet $\mathcal{X} = \{0, 1\}$. (Note that all stationary time series $\{X_n\}_{n=0}^{\infty}$ can be thought to be a two sided time series, that is, $\{X_n\}_{n=-\infty}^{\infty}$.) Now we exhibit an estimator which is consistent on a certain stopping time sequence for a restricted class of stationary time series. For notational convenience, let $X_m^n = (X_m, \ldots, X_n)$, where $m \le n$.

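Define the stopping times as follows. Set $\zeta_0 = 0$. For $k = 1, 2, \ldots$, define the sequences $\eta_k$ and $\zeta_k$ recursively. Let

$$\eta_k = \min\{t > 0 : X_{\zeta_{k-1}-(k-1)+t}^{\zeta_{k-1}+t} = X_{\zeta_{k-1}-(k-1)}^{\zeta_{k-1}}\} \quad\text{and}\quad \zeta_k = \zeta_{k-1} + \eta_k.$$

One denotes the $k$th estimate of $P(X_{\zeta_k+1} = 1 \mid X_0^{\zeta_k})$ by $g_k$, and defines it to be

$$g_k = \frac{1}{k} \sum_{j=0}^{k-1} X_{\zeta_j+1}. \tag{3}$$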
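The construction can be sketched on a finite sample as follows. The sketch assumes that each stopping time is the first recurrence of the block of length $k$ ending at the previous stopping time, which is our reading of the recursive definition, and that the $k$th estimate is the average of the symbols observed immediately after the first $k$ stopping times $\zeta_0, \ldots, \zeta_{k-1}$. The function name and the choice to stop when the sample runs out are hypothetical.

```python
def forecast_at_stopping_times(xs):
    """Compute stopping times zeta_0 < zeta_1 < ... and estimates g_1, g_2, ...

    xs : finite list of 0/1 observations X_0, X_1, ...
    Returns (zetas, estimates), where estimates[k-1] is g_k, available at time zetas[k].
    The loop ends when the next recurrence is not found inside the sample.
    """
    zetas = [0]          # zeta_0 = 0
    estimates = []
    running_sum = 0      # sum of X_{zeta_j + 1} for j = 0, ..., k-1
    k = 1
    while True:
        prev = zetas[-1]
        start = prev - (k - 1)              # block X_{zeta_{k-1}-(k-1)}, ..., X_{zeta_{k-1}}
        block = xs[start:prev + 1]
        # eta_k: smallest shift t > 0 at which the same block of length k reappears
        eta = next((t for t in range(1, len(xs) - prev)
                    if xs[start + t:prev + t + 1] == block), None)
        if eta is None:
            break
        running_sum += xs[prev + 1]         # X_{zeta_{k-1} + 1}
        zetas.append(prev + eta)
        estimates.append(running_sum / k)   # g_k
        k += 1
    return zetas, estimates
```

On the sample `[1, 0, 1, 1, 0, 1, 1, 0, 1]` the blocks `1`, `0 1` and `1 0 1` recur at times 2, 5 and 8, and the symbols following times 0, 2 and 5 are 0, 1 and 1, so the successive estimates are 0, 1/2 and 2/3.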

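It will be useful to define other processes $\{\tilde X_n\}_{n=-\infty}^{0}$ and $\{\hat X_n^{(k)}\}_{n=-\infty}^{\infty}$ for $k \ge 0$ as follows. Let

$$\tilde X_{-n} = X_{\zeta_n - n} \text{ for } n \ge 0, \quad\text{and}\quad \hat X_n^{(k)} = X_{\zeta_k + n} \text{ for } -\infty < n < \infty. \tag{4}$$

For an arbitrary stationary binary time series $\{Y_n\}$, and for all $k \ge 1$ and $1 \le i \le k$, define $\hat\zeta_0^k(Y_{-\infty}^0) = 0$ and

$$\hat\eta_i^k(Y_{-\infty}^0) = \min\{t > 0 : Y_{\hat\zeta_{i-1}^k-(k-i)-t}^{\hat\zeta_{i-1}^k-t} = Y_{\hat\zeta_{i-1}^k-(k-i)}^{\hat\zeta_{i-1}^k}\}$$

and

$$\hat\zeta_i^k(Y_{-\infty}^0) = \hat\zeta_{i-1}^k(Y_{-\infty}^0) - \hat\eta_i^k(Y_{-\infty}^0).$$

When it is obvious on which time series $\hat\eta_i^k(Y_{-\infty}^0)$ and $\hat\zeta_i^k(Y_{-\infty}^0)$ are evaluated, we will use the notation $\hat\eta_i^k$ and $\hat\zeta_i^k$. Let $T$ denote the left shift operator, that is, $(T x_{-\infty}^{\infty})_i = x_{i+1}$. It is easy to see that if $\zeta_k(x_{-\infty}^{\infty}) = l$ then $\hat\zeta_k^k(T^l x_{-\infty}^{\infty}) = -l$.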
We will need the next lemma for later use. 

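Lemma 1 Let $\{X_n\}_{n=-\infty}^{\infty}$ be a stationary binary process. Then the time series $\{\hat X_n^{(k)}\}_{n=-\infty}^{\infty}$, $\{\tilde X_n\}_{n=-\infty}^{0}$ and $\{X_n\}_{n=-\infty}^{\infty}$ have identical distribution. Thus all these time series are stationary, and $\{\tilde X_n\}_{n=-\infty}^{0}$ can be thought to be a two sided stationary time series $\{\tilde X_n\}_{n=-\infty}^{\infty}$.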

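Let $k \ge 0$, $n \ge 0$, $m \ge 0$, $x_{m-n}^{m} \in \mathcal{X}^{n+1}$ be arbitrary. It is immediate that for $l \ge 0$,

$$T^l \{X_{m-n+l}^{m+l} = x_{m-n}^{m}, \zeta_k = l\} = \{X_{m-n}^{m} = x_{m-n}^{m}, \hat\zeta_k^k(X_{-\infty}^0) = -l\}. \tag{5}$$

First we prove that for $k \ge 0$, $P((\hat X_{m-n}^{(k)}, \ldots, \hat X_m^{(k)}) = (x_{m-n}, \ldots, x_m)) = P(X_{m-n}^{m} = x_{m-n}^{m})$. By the construction in (4), the stationarity of the time series and (5) we have

$$\begin{aligned}
P((\hat X_{m-n}^{(k)}, \ldots, \hat X_m^{(k)}) = (x_{m-n}, \ldots, x_m))
&= \sum_{l=0}^{\infty} P(X_{m-n+l}^{m+l} = x_{m-n}^{m}, \zeta_k = l) \\
&= \sum_{l=0}^{\infty} P(X_{m-n}^{m} = x_{m-n}^{m}, \hat\zeta_k^k(X_{-\infty}^0) = -l) \\
&= P(X_{m-n}^{m} = x_{m-n}^{m}).
\end{aligned}$$

Now we prove that $P(\tilde X_{-n}^0 = x_{-n}^0) = P(X_{-n}^0 = x_{-n}^0)$. By the construction in (4), the stationarity of the time series and (5) (with $m = 0$) we have

$$\begin{aligned}
P(\tilde X_{-n}^0 = x_{-n}^0)
&= \sum_{l=0}^{\infty} P(X_{-n+l}^{l} = x_{-n}^0, \zeta_n = l) \\
&= \sum_{l=0}^{\infty} P(X_{-n}^0 = x_{-n}^0, \hat\zeta_n^n(X_{-\infty}^0) = -l) \\
&= P(X_{-n}^0 = x_{-n}^0).
\end{aligned}$$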
The proof of the Lemma is complete. 

Now we show the consistency of our estimate $g_k$ defined in (3).

Theorem 1 Let $\{X_n\}$ be a stationary binary time series. For the estimator defined in (3),

$$\lim_{k\to\infty} \left| g_k - P(X_{\zeta_k+1} = 1 \mid X_0^{\zeta_k}) \right| = 0 \quad\text{almost surely}$$

provided that the conditional probability $P(X_1 = 1 \mid X_{-\infty}^0)$ is almost surely continuous. Moreover, under the same conditions,

$$\lim_{k\to\infty} g_k = \lim_{k\to\infty} P(X_{\zeta_k+1} = 1 \mid X_0^{\zeta_k}) = P(\tilde X_1 = 1 \mid \tilde X_{-\infty}^0) \quad\text{almost surely.}$$

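Recalling (3) we can write

$$\begin{aligned}
g_k &= \frac{1}{k} \sum_{j=0}^{k-1} \left[ X_{\zeta_j+1} - P(X_{\zeta_j+1} = 1 \mid X_{-\infty}^{\zeta_j}) \right] + \frac{1}{k} \sum_{j=0}^{k-1} P(X_{\zeta_j+1} = 1 \mid X_{-\infty}^{\zeta_j}) \\
&= \frac{1}{k} \sum_{j=0}^{k-1} \Gamma_j + \frac{1}{k} \sum_{j=0}^{k-1} P(X_{\zeta_j+1} = 1 \mid X_{-\infty}^{\zeta_j}).
\end{aligned} \tag{6}$$

Observe that $\{\Gamma_j, \sigma(X_{-\infty}^{\zeta_j+1})\}$ is a bounded martingale difference sequence for $0 \le j < \infty$. To see this notice that $\sigma(X_{-\infty}^{\zeta_j+1})$ is monotone increasing, and $\Gamma_j$ is measurable with respect to $\sigma(X_{-\infty}^{\zeta_j+1})$, and $E(\Gamma_j \mid X_{-\infty}^{\zeta_{j-1}+1}) = 0$ for $0 \le j < \infty$ (where you may define $\zeta_{-1} = -1$). Now apply Azuma's exponential bound for bounded martingale differences in Azuma [1] to get that for any $\epsilon > 0$,

$$P\left( \left| \frac{1}{k} \sum_{j=0}^{k-1} \Gamma_j \right| > \epsilon \right) \le 2 \exp(-\epsilon^2 k / 2).$$

After summing the right hand side over $k$, and appealing to the Borel-Cantelli lemma for a sequence of $\epsilon$'s tending to zero, we get $\frac{1}{k} \sum_{j=0}^{k-1} \Gamma_j \to 0$ almost surely.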
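The summability used in the Borel-Cantelli step can be checked numerically. The sketch below treats the bound $2\exp(-\epsilon^2 k/2)$ as a geometric sequence in $k$ with ratio $\exp(-\epsilon^2/2)$ and compares a long partial sum with the closed form; the function names are hypothetical.

```python
import math


def azuma_bound(eps, k):
    """Right hand side of the exponential bound, 2 * exp(-eps^2 * k / 2)."""
    return 2.0 * math.exp(-eps * eps * k / 2.0)


def azuma_bound_total(eps):
    """Sum of the bound over k = 1, 2, ...: a geometric series with ratio exp(-eps^2 / 2)."""
    ratio = math.exp(-eps * eps / 2.0)
    return 2.0 * ratio / (1.0 - ratio)
```

Since the total is finite for every fixed $\epsilon > 0$, the events on the left hand side occur only finitely often almost surely, which is exactly what the Borel-Cantelli lemma delivers.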
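Define the function $p : \mathcal{X}^{*-} \to [0, 1]$ as $p(x_{-\infty}^0) = P(X_1 = 1 \mid X_{-\infty}^0 = x_{-\infty}^0)$.

For arbitrary $j \ge 0$, by the construction in (4),

$$(\hat X_{-j}^{(j)}, \ldots, \hat X_0^{(j)}) = \tilde X_{-j}^0, \quad\text{and}\quad \lim_{j\to\infty} d^*((\ldots, \hat X_{-1}^{(j)}, \hat X_0^{(j)}), \tilde X_{-\infty}^0) = 0 \tag{7}$$

almost surely. By assumption, the function $p(\cdot)$ is continuous on a set $C \subseteq \mathcal{X}^{*-}$ with $P(X_{-\infty}^0 \in C) = 1$, and by the Lemma, $P(\tilde X_{-\infty}^0 \in C) = 1$, and for each $j \ge 0$, $P((\ldots, \hat X_{-1}^{(j)}, \hat X_0^{(j)}) \in C) = 1$, and finally,

$$P(\tilde X_{-\infty}^0 \in C, (\ldots, \hat X_{-1}^{(j)}, \hat X_0^{(j)}) \in C \text{ for all } j \ge 0) = 1.$$

By the Lemma, the construction in (4), the continuity of $p(\cdot)$ on the set $C$, and by (7),

$$P(X_{\zeta_j+1} = 1 \mid X_{-\infty}^{\zeta_j}) = p(\ldots, \hat X_{-1}^{(j)}, \hat X_0^{(j)}) \to p(\tilde X_{-\infty}^0) = P(\tilde X_1 = 1 \mid \tilde X_{-\infty}^0)$$

and $\frac{1}{k} \sum_{j=0}^{k-1} P(X_{\zeta_j+1} = 1 \mid X_{-\infty}^{\zeta_j}) \to P(\tilde X_1 = 1 \mid \tilde X_{-\infty}^0)$ almost surely. We have proved that $g_k \to P(\tilde X_1 = 1 \mid \tilde X_{-\infty}^0)$ almost surely.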

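Now observe that by (1) and the continuity of $p(\cdot)$ on the set $C$, almost surely, for all $\epsilon > 0$, there is a $J(\epsilon, \tilde X_{-\infty}^0)$ such that for all $z_{-\infty}^0 \in C$, if $z_{-J}^0 = \tilde X_{-J}^0$ then $|p(z_{-\infty}^0) - p(\tilde X_{-\infty}^0)| < \epsilon$. By (7), and since $\epsilon > 0$ was arbitrary, almost surely,

$$\begin{aligned}
\lim_{j\to\infty} P(X_{\zeta_j+1} = 1 \mid X_0^{\zeta_j})
&= \lim_{j\to\infty} E\{ P(X_{\zeta_j+1} = 1 \mid X_{-\infty}^{\zeta_j}) \mid X_0^{\zeta_j} \} \\
&= \lim_{j\to\infty} E\{ p(\ldots, \hat X_{-1}^{(j)}, \hat X_0^{(j)}) \mid X_0^{\zeta_j} \} \\
&= p(\tilde X_{-\infty}^0) = P(\tilde X_1 = 1 \mid \tilde X_{-\infty}^0).
\end{aligned}$$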

The proof of Theorem 1 is complete. 

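Remark. We note that for all stationary binary time-series, the estimation scheme described above is consistent in probability. This may be seen as follows:

$$\begin{aligned}
E\left| g_k - P(X_{\zeta_k+1} = 1 \mid X_0^{\zeta_k}) \right|
&\le E\left| \frac{1}{k} \sum_{j=0}^{k-1} \left[ X_{\zeta_j+1} - P(X_{\zeta_j+1} = 1 \mid X_{-\infty}^{\zeta_j}) \right] \right| \\
&\quad + \frac{1}{k} \sum_{j=0}^{k-1} E\left| P(\hat X_1^{(j)} = 1 \mid \ldots, \hat X_{-1}^{(j)}, \hat X_0^{(j)}) - P(\hat X_1^{(j)} = 1 \mid \hat X_{-j}^{(j)}, \ldots, \hat X_0^{(j)}) \right| \\
&\quad + E\left| \frac{1}{k} \sum_{j=0}^{k-1} P(\hat X_1^{(j)} = 1 \mid \hat X_{-j}^{(j)}, \ldots, \hat X_0^{(j)}) - P(\hat X_1^{(k)} = 1 \mid \hat X_{-k}^{(k)}, \ldots, \hat X_0^{(k)}) \right|
\end{aligned}$$

where we used (7) and the Lemma. The first term converges to zero since $X_{\zeta_j+1} - P(X_{\zeta_j+1} = 1 \mid X_{-\infty}^{\zeta_j})$ is a martingale difference sequence with respect to $\sigma(X_{-\infty}^{\zeta_j+1})$ and an average of bounded martingale differences converges to zero almost surely, cf. Azuma [1]. Applying (4), (7) and the Lemma, the sum of the last two terms can be estimated by the sum

$$\frac{1}{k} \sum_{j=0}^{k-1} E\left| P(X_1 = 1 \mid X_{-\infty}^0) - P(X_1 = 1 \mid X_{-j}^0) \right| + E\left| \frac{1}{k} \sum_{j=0}^{k-1} P(X_1 = 1 \mid X_{-j}^0) - P(X_1 = 1 \mid X_{-k}^0) \right|$$

and both terms converge to zero since by the martingale convergence theorem $\lim_{j\to\infty} P(X_1 = 1 \mid X_{-j}^0) = P(X_1 = 1 \mid X_{-\infty}^0)$ almost surely, and thus the limit in fact exists and equals zero.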

Next we will give some universal estimates for the growth rate of the stopping times $\zeta_k$ in terms of the entropy rate of the process. This is natural since the $\zeta_k$ are defined by recurrence times for blocks of length $k$, and these are known to grow exponentially with the entropy rate. (Cf. Ornstein and Weiss [15].)

Theorem 2 Let $\{X_n\}$ be a stationary and ergodic binary time series. Then for arbitrary $\epsilon > 0$,

$$\zeta_k < 2^{k(H+\epsilon)} \quad\text{eventually almost surely,}$$

where $H$ denotes the entropy rate associated with the time series $\{X_n\}$.

Let $\mathcal{X}^*$ be the set of all two-sided binary sequences, that is,

$$\mathcal{X}^* = \{(\ldots, x_{-1}, x_0, x_1, \ldots) : x_i \in \{0, 1\} \text{ for all } -\infty < i < \infty\}.$$

Define $B_k \subseteq \{0, 1\}^k$ as

$$B_k = \{x_{-k+1}^0 \in \{0, 1\}^k : 2^{-k(H+0.5\epsilon)} < p_k(x_{-k+1}^0)\},$$

where $p_k(\cdot)$ is as in (2). Note that there is a trivial bound on the cardinality of the set $B_k$, namely,

$$|B_k| \le 2^{k(H+0.5\epsilon)}. \tag{8}$$
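The bound (8) holds because the blocks in the set each have probability above the threshold while all block probabilities sum to at most one. A brute-force check for small block lengths can be sketched as follows, assuming an i.i.d. Bernoulli series only as a convenient example with known entropy; the function name is hypothetical.

```python
import itertools
import math


def typical_set_size(q, k, eps):
    """Count k-blocks with probability above 2^{-k(H + 0.5*eps)} for i.i.d. Bernoulli(q).

    Returns (size, bound), where bound = 2^{k(H + 0.5*eps)} is the right hand side of (8).
    Requires 0 < q < 1.
    """
    entropy = -(q * math.log2(q) + (1 - q) * math.log2(1 - q))
    exponent = k * (entropy + 0.5 * eps)
    threshold = 2.0 ** (-exponent)
    size = 0
    for block in itertools.product((0, 1), repeat=k):
        ones = sum(block)
        if q ** ones * (1 - q) ** (k - ones) > threshold:
            size += 1
    return size, 2.0 ** exponent
```

Calling `typical_set_size(0.3, 10, 0.2)` returns a count that stays below the bound, as the counting argument predicts.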

By the Lemma, the distribution of the time series $\{\tilde X_n\}$ is the same as the distribution of $\{X_n\}$ and by the Shannon-McMillan-Breiman Theorem (cf. Cover, Thomas [4], p. 475),

$$P\left( \bigcup_{k=1}^{\infty} \bigcap_{i \ge k} \{\tilde X_{-i+1}^0 \in B_i\} \right) = 1. \tag{9}$$

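Define the set $Q_k(y_{-k+1}^0)$ as follows:

$$Q_k(y_{-k+1}^0) = \{z_{-\infty}^{\infty} \in \mathcal{X}^* : -\hat\zeta_k^k(z_{-\infty}^0) \ge 2^{k(H+\epsilon)}, z_{-k+1}^0 = y_{-k+1}^0\}.$$

We will estimate the probability of $Q_k(y_{-k+1}^0)$ by means of the ergodic theorem. Let $x_{-\infty}^{\infty} \in \mathcal{X}^*$ be a typical sequence of the time series $\{X_n\}$. Define $\alpha_0(y_{-k+1}^0) = 0$ and for $i \ge 1$ let

$$\alpha_i(y_{-k+1}^0) = \min\{l > \alpha_{i-1}(y_{-k+1}^0) : T^{-l} x_{-\infty}^{\infty} \in Q_k(y_{-k+1}^0)\}.$$

Define also $\beta_0(y_{-k+1}^0) = 0$ and for $i \ge 1$ let

$$\beta_i(y_{-k+1}^0) = \min\{l > \beta_{i-1}(y_{-k+1}^0) + 2^{k(H+\epsilon)} : T^{-l} x_{-\infty}^{\infty} \in Q_k(y_{-k+1}^0)\}.$$

Observe that for arbitrary $l > 0$,

$$\sum_{j=1}^{\infty} 1_{\{\beta_{l-1}(y_{-k+1}^0) < \alpha_j(y_{-k+1}^0) \le \beta_l(y_{-k+1}^0)\}} \le k + 1.$$

By the Lemma and the ergodicity of the time series $\{X_n\}$,

$$\begin{aligned}
P((\ldots, \hat X_{-1}^{(k)}, \hat X_0^{(k)}, \hat X_1^{(k)}, \ldots) \in Q_k(y_{-k+1}^0))
&= P(X_{-\infty}^{\infty} \in Q_k(y_{-k+1}^0)) \\
&= \lim_{t\to\infty} \frac{1}{\beta_t(y_{-k+1}^0)} \sum_{l=1}^{t} \sum_{j=1}^{\infty} 1_{\{\beta_{l-1}(y_{-k+1}^0) < \alpha_j(y_{-k+1}^0) \le \beta_l(y_{-k+1}^0)\}} \\
&\le \lim_{t\to\infty} \frac{t(k+1)}{t \, 2^{k(H+\epsilon)}} = \frac{k+1}{2^{k(H+\epsilon)}}.
\end{aligned} \tag{10}$$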

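By the construction in (4), $-\hat\zeta_k^k(\ldots, \hat X_{-1}^{(k)}, \hat X_0^{(k)}) = \zeta_k(X_0^{\infty})$ and $(\hat X_{-k+1}^{(k)}, \ldots, \hat X_0^{(k)}) = \tilde X_{-k+1}^0$, and by the upper bound on the cardinality of the set $B_k$ in (8) and by (10), we get

$$\begin{aligned}
P(\zeta_k(X_0^{\infty}) \ge 2^{k(H+\epsilon)}, \tilde X_{-k+1}^0 \in B_k)
&= P(-\hat\zeta_k^k(\ldots, \hat X_{-1}^{(k)}, \hat X_0^{(k)}) \ge 2^{k(H+\epsilon)}, \tilde X_{-k+1}^0 \in B_k) \\
&= P(-\hat\zeta_k^k(\ldots, \hat X_{-1}^{(k)}, \hat X_0^{(k)}) \ge 2^{k(H+\epsilon)}, (\hat X_{-k+1}^{(k)}, \ldots, \hat X_0^{(k)}) \in B_k) \\
&= \sum_{y_{-k+1}^0 \in B_k} P((\ldots, \hat X_{-1}^{(k)}, \hat X_0^{(k)}, \hat X_1^{(k)}, \ldots) \in Q_k(y_{-k+1}^0)) \\
&\le (k+1) 2^{-0.5\epsilon k}.
\end{aligned}$$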

The right hand side sums, and the Borel-Cantelli Lemma and the Shannon-McMillan-Breiman Theorem in (9) together yield that $\zeta_k < 2^{k(H+\epsilon)}$ eventually almost surely, and Theorem 2 is proved.